Abstract. We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases
using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. 
We propose to use a first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the 
subsequent instantiation of these schematic answers using a conventional relational DBMS. In this research note, 
we outline the main idea of this technique - using abstractions of databases and constrained clauses for deriving 
schematic answers. The proposed method can be directly used with regular RDB, including legacy databases. 
Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology. 

1 Introduction. 

1.1 Settings and motivation. 

Consider the following scenario. Suppose we have one¹ relational database (RDB), one or more expressive knowledge
bases (KB) for domains to which the data in the RDB is related (e. g., rule bases and/or ontologies), and,
optionally, some mapping between the RDB schema and the logical language of the domains, i. e., a logical de- 
scription of the relations in the RDB to link them to the concepts and relations defined by the knowledge bases. 
We would like to be able to formulate queries logically and answer them w. r. t. the knowledge bases and the RDB 
treated virtually as a collection of ground atomic facts (e. g., by viewing each table row as a separate ground fact). 
To make this process efficient, we would like to use the modern RDB technology as much as possible by delegating 
as much work as possible to the RDBMS hosting the database. 

We propose a method to implement this scenario, based on the use of resolution for incremental transformation 
of semantic queries into sequences of SQL queries that can be directly evaluated on the RDB, and whose results 
provide answers to the original queries. 

We envisage two main applications for the proposed technology. 

Enhancing the interface to conventional relational databases. Flexible querying of conventional RDBs by non- 
programmer users is very problematic because real-life enterprise databases often have complex designs. Writing a 
correct query requires good understanding of technical details of the DB schema, such as table and attribute names, 
foreign key relationships, nullable fields, etc. So most of RDB querying by non-programmer users is done with
preprogrammed parameterised queries, usually represented as forms of various kinds. 

Even when special methodologies are used, like Query-by-Example (see, e. g., [25]), that allow hiding some of
the complexities of SQL and database designs from the end users, one important inherent limitation remains in 
force. Whereas mapping some domain concepts to the RDB schema elements may be easy, many other concepts 
may be much more difficult to map. For example, it is easy to select instances of the concept "student" if there 
is a table explicitly storing all students, but if the user wants to extract a list of all members of a department in a 
university, he may have to separately query different tables storing information about students, faculty and support 
staff (assuming that there is no table specifically storing members of all these kinds), and then create a union of the 
results. 

1 In principle, our approach can be extended to multiple heterogeneous and distributed databases, but in this research note
we assume, for simplicity, that we are dealing with just one DB.



This example exposes well the root of the problem: mapping some domain concepts to the data is difficult because 
it requires application of the domain knowledge. In the example, the involved piece of domain knowledge is the fact 
that students, faculty and support staff are all department members, and the user has to apply it manually to obtain 
the required results. 

Semantic querying is based on automatic application of domain knowledge formalised in the form of, e. g., rules 
and ontological axioms. In this approach, DB programmers "semantically document" their DB designs by providing 
an explicit correspondence between the RDB schemas and domain terminologies, e. g., in the form of logical 
axioms. This alone allows an end user to formulate queries directly in the terminology of the domain, without 
even the slightest idea about how the underlying RDBs are structured². However, the biggest advantage comes from
the fact that reasoning w. r. t. additional, completely external KBs can be employed to generate and justify some
answers, which makes querying not just semantic, as in [27], but also deductive. In our current example, the user
can provide, as a part of the query, some KB that links the relations of being a department member, being a student 
in the department, etc. In some application contexts, it is important to be able to use rather expressive KBs for such 
purposes. Rule-based KBs and expressive DL ontologies are of special interest, especially in combination.

Database application development can also benefit from the proposed technology. Currently, it is often difficult to
change the structure of an existing corporate database if there are many applications relying on the particular schema 
being used. Our scenario suggests an attractive alternative - applications that only query the DB semantically need 
not be rewritten when the DB design changes, provided that a suitable semantic mapping can be written for the 
new design. In particular, new DBs can be added without changing the applications that will query them. With such 
decoupling of applications from the DB designs, it may also be easier to outsource the development of applications 
because exposing the details of DB designs to application developers becomes unnecessary. 

Web-scale semantic search. The Semantic Web is expected to contain a lot of data in the form of RDF and OWL 
descriptions referring to various formalised vocabularies - ontologies. In some cases the expressivity of RDF(S) 
and OWL may not be enough and knowledge bases in other formalisms, e. g., RuleML [7, 12], RIF [1] or SWRL [3],
have to be used to capture more complex dependencies between domain concepts and relations, thus making the 
data descriptions sufficiently semantically rich. 

The utility of the Semantic Web data will strongly depend on how easily and how efficiently users and agents can 
query it. Roughly speaking, we need to query extremely large volumes of highly distributed data modulo expressive 
knowledge bases, so that not only direct answers based on the stored data are returned, but also implied answers 
that can only be obtained by reasoning. 

The approach proposed here may be a part of a solution to this problem: large sets of RDF triples and OWL data 
descriptions (coming from Semantic Web documents) can be loaded into a relational database and then queried 
deductively modulo the relevant knowledge bases. Different DB layouts can be used, depending on the nature of the 
data being loaded. For example, if we load an OWL ABox, we can have a separate one-column table for keeping 
instances of each class and, similarly, a separate two-column table for keeping assertions of each property³. Loading
data descriptions into an RDB is a linear operation, so it is unlikely to become a real performance bottleneck. 
Moreover, we can start producing answers even before the data is fully loaded. So the efficiency of such a scheme 
depends mostly on how efficiently the deductive querying on the RDB can be done. 
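
As a rough illustration of the layout just described, the following sketch loads a tiny ABox into an in-memory SQLite database, with one single-column table per class and one two-column table per property. The function name `load_abox`, the column names `id`, `subject` and `object`, and the sample assertions are assumptions made for this example only; they are not prescribed by the method.

```python
import sqlite3

def load_abox(conn, class_assertions, property_assertions):
    """Store class assertions in one-column tables and property
    assertions in two-column tables (hypothetical column names)."""
    cur = conn.cursor()
    for cls, individual in class_assertions:
        # Table names are taken from trusted vocabulary in this sketch.
        cur.execute(f'CREATE TABLE IF NOT EXISTS "{cls}" (id TEXT)')
        cur.execute(f'INSERT INTO "{cls}" VALUES (?)', (individual,))
    for prop, subj, obj in property_assertions:
        cur.execute(
            f'CREATE TABLE IF NOT EXISTS "{prop}" (subject TEXT, object TEXT)')
        cur.execute(f'INSERT INTO "{prop}" VALUES (?, ?)', (subj, obj))
    conn.commit()

conn = sqlite3.connect(":memory:")
load_abox(
    conn,
    class_assertions=[("graduateStudent", "s1"), ("course", "c1")],
    property_assertions=[("takesCourse", "s1", "c1")],
)
rows = conn.execute('SELECT subject, object FROM "takesCourse"').fetchall()
print(rows)
```

Each assertion is handled once and independently of the others, which is what makes the loading step linear in the size of the data.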

Just like text-based Web search engines do not indiscriminately scan all the accessible documents each time a
new query is processed, semantic search systems cannot examine all accessible data descriptions in every retrieval
attempt. Instead, some form of indexing is necessary that would allow us to avoid downloading data that is irrelevant
to a specific query, and would focus the processing on the sets of assertions that are likely to contribute to some 
answers to the query. We will show that the core feature of our approach to deductive querying of RDB - incremental 
query rewriting - suggests a natural way of semantically indexing distributed data sources. 



1.2 Outline of the proposed method. 

To implement the target scenario, we propose to use a first-order logic reasoner in combination with a conventional 
RDBMS, so that the reasoner does the "smart" part of the job, and the RDBMS is used for what it is best at - 
relatively simple processing of large volumes of relational data by computing table joins. Roughly, the reasoner 
works as a query preprocessor. It accepts a semantic query, the relevant knowledge bases and a semantic mapping 
for a DB as its input, and generates a (possibly infinite) number of expressions which we call schematic answers⁴

2 This does not alleviate the need for convenient query interfaces, but they are outside the scope of this research note. 

3 This is the scheme used in all examples throughout the research note.

4 In earlier versions of this research note we used the term generic answers, which clashes with the classification proposed in [9].
that can be easily converted into SQL queries. These SQL queries are then evaluated on the DB with the help of the
RDBMS. The union of the results for these SQL queries contains all answers to the original deductive query. 
This idea can be implemented with a relatively simple architecture, as shown in Figure 1. The architecture introduces
two main modules - a reasoner for finding schematic answers and an SQL generator to turn these answers into
SQL queries. We also assume that some off-the-shelf RDBMS is used to answer the SQL queries. All three components
(can) work in parallel: while the reasoner searches for another schematic answer, the SQL generator can
process some previous schematic answers and the RDBMS can generate instances for some earlier schematic answers
and communicate them to the user.

Optionally, the reasoner may try to prune the search space by checking certain constraints over the RDB (details
will be provided in Section 4). These constraints are also converted into SQL queries and sent to the RDBMS for
evaluation. The results of the evaluation ('satisfiable' or 'unsatisfiable') are sent back to the reasoner, which
can use the absence of solutions for a constraint as a justification for suppressing certain inferences.
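
One way to picture such a check is as an existence query: the constraint is satisfiable over the RDB exactly when the corresponding join returns at least one row. The sketch below assumes SQLite and a hand-written SQL rendering of the constraint; the helper name `constraint_is_satisfiable` and the table layout are hypothetical and only serve the illustration.

```python
import sqlite3

def constraint_is_satisfiable(conn, from_clause, where_clause="1 = 1"):
    """Return True if at least one row satisfies the constraint."""
    sql = f"SELECT 1 FROM {from_clause} WHERE {where_clause} LIMIT 1"
    return conn.execute(sql).fetchone() is not None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id TEXT);
    CREATE TABLE takesCourse (student TEXT, course TEXT);
    INSERT INTO person VALUES ('p1');
""")

# Constraint person(X), takesCourse(X, Y): takesCourse is empty, so a
# clause carrying this constraint cannot contribute any answers.
joined = constraint_is_satisfiable(
    conn, "person, takesCourse", "person.id = takesCourse.student")

# Constraint person(X) alone has a solution.
single = constraint_is_satisfiable(conn, "person")
print(joined, single)
```

The `LIMIT 1` keeps the check cheap, since the reasoner only needs to know whether a solution exists, not how many there are.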
Fig. 1. Architecture for deductive query answering



The rest of this research note is structured as follows. In Section 2 we introduce the method intuitively. In Section 3
we provide a minimal mathematical justification of usability of our approach. More specifically, we demonstrate
soundness and completeness of some standard resolution-based calculi for rewriting semantic queries into
sequences of schematic answers. In Section 4 we describe one optimisation specific to schematic answer search. In
Section 6 we briefly discuss how semantic indexing can be done using data abstractions, in the context of Web-scale
retrieval. In Section 5 we provide an algorithm for converting the logical representation of schematic answers into
SQL. Finally, Sections 7 and 8 briefly describe some related and future work.



2 Informal method description. 

We model an RDB as a finite set of ground atomic formulas, so that RDB table names provide the predicates, and 
rows are conceptually treated as applications of the predicates to the row elements. In the example below, we have 
a table takesCourse from a University DB, keeping information about which student takes which course, whose 
rows are mapped to a set of facts. 



Table takesCourse:

    student   course
    s1        c1
    s2        c2
    s3        c3

Corresponding facts:

    takesCourse(s1, c1)
    takesCourse(s2, c2)
    takesCourse(s3, c3)
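
The row-to-fact reading can be sketched in a few lines. The example below assumes an SQLite table with the same content as takesCourse above; the helper `table_to_facts` is a hypothetical name, and rendering the atoms as strings is chosen only for readability.

```python
import sqlite3

def table_to_facts(conn, table):
    """Yield one ground atom per row, rendered as 'table(v1, v2, ...)'."""
    for row in conn.execute(f'SELECT * FROM "{table}"'):
        yield f"{table}({', '.join(str(v) for v in row)})"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE takesCourse (student TEXT, course TEXT);
    INSERT INTO takesCourse VALUES ('s1', 'c1'), ('s2', 'c2'), ('s3', 'c3');
""")
facts = list(table_to_facts(conn, "takesCourse"))
for fact in facts:
    print(fact)
```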



Before we proceed with more important things, note that in all our examples in this research note, the data is 
assumed to be a relational representation of some DL ABoxes. This is done not to clutter the presentation of the 
main ideas with RDB schema-related details. In particular, there is no need for a special RDB-to-KB mapping 
because the RDB tables directly correspond to concepts and properties. It bears repeating that this assumption is
made only to simplify the presentation - our approach is applicable to any RDBs, including legacy ones, as long as 
their design allows reasonable semantic mapping. 

Now, suppose we are trying to answer a query over our RDB deductively, e. g., modulo some KB. 

Naive approach as a starting point. Hypothetically, we can explicitly represent the DB as a collection of ground 
atomic facts and use some resolution-based FOL reasoner supporting query answering, e.g., Vampire [26] or Gandalf.

Even if we have enough memory to load the facts, this approach is likely to be very inefficient for the following
reason. If the RDB is large and the selectivity of the query is not very high, we can expect that
many answers will be obtained with structurally identical proofs. For example, if our DB contains facts
graduateStudent($s_1$), ..., graduateStudent($s_{100}$) (representing some table graduateStudent which simply
keeps a list of all graduate students), the facts will give rise to 100 answers to the query student(X)⁵, each
having a refutational proof of the form shown in Figure 2 (where grStud, takesC, pers and stud abbreviate
graduateStudent, takesCourse, person and student, and sk0 is a Skolem function).

This example is intended to demonstrate how wasteful reasoning on the per-answer basis is. Roughly speaking, 
the required amount of reasoning is multiplied by the number of answers. Even if the selectivity of the query is
very high, the reasoner is still likely to waste a lot of work in unsuccessful attempts represented by derivations not 
leading to any answers. 

Note that these observations are not too specific to the choice of the reasoning method. For example, if we used 
Prolog or a tableaux-based DL reasoner, we would have a similar picture: the same rule applications would be 
performed for each answer $s_i$.

Main idea. The main idea of our proposal is that answers with similar proofs should be obtained in bulk. More 
specifically, we propose to use reasoning to find schematic answers to queries, which can be later very efficiently 
instantiated by querying the RDB via the standard highly optimised RDBMS mechanisms. Technically, we propose
to search for the schematic answers by reasoning on an abstraction of the RDB in some resolution- and
paramodulation-based calculus (see [5, 21]). The abstraction and the reasoning on the abstraction should be organised
in such a way that the obtained schematic answers can be turned into regular RDBMS queries (e.g., SQL
queries).

Constrained clauses and table abstractions. To illustrate our main idea, we apply it to the current example. The
clause grStud(X) | grStud(X) is the abstraction of the relevant part of the RDB, i.e., it represents (generalises)
all the facts grStud($s_1$), ..., grStud($s_{100}$). This is a very important feature of our approach, so we emphasise that
a potentially very large set of facts is compactly represented with just one clause. The part before "|" is the ordinary
logical content of the clause. What comes after "|" is a special constraint. These constraints will be inherited in all
inference rules, instantiated with the corresponding unifiers and combined when they come from different premises,
just like, e. g., ordering or unifiability constraints in paramodulation-based theorem proving [21]. Although our
constraints can be used as regular constraints - that is, to identify redundant inferences by checking the satisfiability
of the associated constraints w.r.t. the RDB (see Section 4) - their main purpose is to record which RDB fact
abstractions contribute to a schematic answer and what conditions on the variables of the abstractions have to be
checked when the schematic answer is instantiated, so that the obtained concrete answers are sound.
A derivation of a schematic answer for the query student(X), covering all the concrete solutions X := $s_1$, ...,
X := $s_{100}$, is shown in Figure 3. Note that the last inference simply merges three identical atomic constraints.
Also note that we write the answer literals on the constraint sides of the clauses, because they are not
intended for resolution.

SQL generation. Semantically the derived schematic answer □ | ¬answer(X), grStud(X) means that if some
value x is in the table graduateStudent, then x is a legitimate concrete answer to the query. So, assuming that
id is the (only) attribute in the RDB table representing the instances of graduateStudent, the derived schematic
answer □ | ¬answer(X), grStud(X) can be turned into the following simple SQL query:

SELECT id AS X 
FROM graduateStudent 

Evaluating this query over the RDB will return all the answers X := $s_1$, ..., X := $s_{100}$.
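
To make the bulk instantiation concrete, the sketch below runs the query above against an in-memory SQLite table holding one hundred graduate students. The identifiers s1, ..., s100 and the choice of SQLite are assumptions of the example; any RDBMS would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graduateStudent (id TEXT)")
conn.executemany(
    "INSERT INTO graduateStudent VALUES (?)",
    [(f"s{i}",) for i in range(1, 101)],
)

# A single SQL query instantiates the schematic answer for every row.
answers = [x for (x,) in conn.execute("SELECT id AS X FROM graduateStudent")]
print(len(answers))
```

The reasoning that produced the schematic answer is done once, while the enumeration of the concrete answers is left entirely to the RDBMS.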
5 Query 6 from LUBM [16].
 (1)  ¬grStud(X) ∨ grCourse(sk0(X))                       ; part of grStud ⊑ ∃takesC.grCourse
 (2)  ¬grStud(X) ∨ takesC(X, sk0(X))                      ; part of grStud ⊑ ∃takesC.grCourse
 (3)  grStud(s_i)                                         ; DB row
 (4)  takesC(s_i, sk0(s_i))                               ; from (2), (3)
 (5)  ¬takesC(X, Y) ∨ ¬course(Y) ∨ ¬pers(X) ∨ stud(X)     ; part of stud ≡ pers ⊓ ∃takesC.course
 (6)  ¬course(sk0(s_i)) ∨ ¬pers(s_i) ∨ stud(s_i)          ; from (4), (5)
 (7)  grCourse(sk0(s_i))                                  ; from (1), (3)
 (8)  ¬grCourse(X) ∨ course(X)                            ; grCourse ⊑ course
 (9)  course(sk0(s_i))                                    ; from (7), (8)
(10)  ¬grStud(X) ∨ pers(X)                                ; grStud ⊑ pers
(11)  pers(s_i)                                           ; from (3), (10)
(12)  ¬pers(s_i) ∨ stud(s_i)                              ; from (6), (9)
(13)  stud(s_i)                                           ; from (11), (12)
(14)  ¬stud(X) ∨ answer(X)                                ; query
(15)  answer(s_i)                                         ; from (13), (14)

Fig. 2. Resolution derivation of the answer X := s_i for the query stud(X).



 (1)  ¬grStud(X) ∨ grCourse(sk0(X))                       ; part of grStud ⊑ ∃takesC.grCourse
 (2)  ¬grStud(X) ∨ takesC(X, sk0(X))                      ; part of grStud ⊑ ∃takesC.grCourse
 (3)  grStud(X) | grStud(X)                               ; DB table abstraction
 (4)  takesC(X, sk0(X)) | grStud(X)                       ; from (2), (3)
 (5)  ¬takesC(X, Y) ∨ ¬course(Y) ∨ ¬pers(X) ∨ stud(X)     ; part of stud ≡ pers ⊓ ∃takesC.course
 (6)  ¬course(sk0(X)) ∨ ¬pers(X) ∨ stud(X) | grStud(X)    ; from (4), (5)
 (7)  grCourse(sk0(X)) | grStud(X)                        ; from (1), (3)
 (8)  ¬grCourse(X) ∨ course(X)                            ; grCourse ⊑ course
 (9)  course(sk0(X)) | grStud(X)                          ; from (7), (8)
(10)  ¬grStud(X) ∨ pers(X)                                ; grStud ⊑ pers
(11)  pers(X) | grStud(X)                                 ; from (3), (10)
(12)  ¬pers(X) ∨ stud(X) | grStud(X), grStud(X)           ; from (6), (9)
(13)  stud(X) | grStud(X), grStud(X), grStud(X)           ; from (11), (12)
(14)  ¬stud(X) | ¬answer(X)                               ; query
(15)  □ | ¬answer(X), grStud(X), grStud(X), grStud(X)     ; from (13), (14)
(16)  □ | ¬answer(X), grStud(X)                           ; from (15), merging identical constraints

Fig. 3. Resolution derivation of some schematic answer for stud(X).



Resolution reasoning on a DB abstraction may give rise to more than one schematic answer. For example,
□ | ¬answer(X), grStud(X) does not necessarily cover all possible solutions of the initial query - it only
enumerates graduate students. If our KB also postulates that any person taking a course is a student, we want to
select all such people as well. So, suppose that our DB also contains the facts person($P_1$), ..., person($P_{100}$),
takesCourse($P_1$, $C_1$), ..., takesCourse($P_{100}$, $C_{100}$) and course($C_1$), ..., course($C_{100}$) in the
corresponding tables person, takesCourse and course. These relations can be represented with the
abstraction clauses person(X) | person(X), takesCourse(X, Y) | takesCourse(X, Y) and
course(X) | course(X). Simple reasoning with these clauses modulo, say, a KB containing the rule
student(P) :- person(P), takesCourse(P, C), course(C) or the DL axiom person ⊓ ∃takesC.course ⊑
student, produces the schematic answer □ | ¬answer(X), person(X), takesCourse(X, Y), course(Y).
Semantically it means that if table takesCourse contains a record {student = s, course = c}, and tables
person and course contain s and c correspondingly, then X := s is a legitimate concrete answer. Thus, the
schematic answer can be turned into the following SQL query:

SELECT person.id AS X 
FROM person, takesCourse, course 
WHERE person.id = takesCourse.student 
AND course.id = takesCourse.course

The join conditions person.id = takesCourse.student and course.id = takesCourse.course reflect the
fact that the corresponding arguments of the predicates in the constraint attached to the schematic answer are
equal: e.g., the only argument of person, corresponding to person.id, and the first argument of takesCourse,
corresponding to takesCourse.student, are both the same variable X.
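
The way shared variables become join conditions can be sketched as follows. This is a simplified illustration, not the conversion algorithm of Section 5: it assumes that all arguments of the recording literals are variables (no constants or Skolem terms), that every answer variable occurs in some atom, and that the mapping `columns` from table names to attribute names is given. The function name `schematic_answer_to_sql` and the aliases t0, t1, ... are hypothetical.

```python
import sqlite3

def schematic_answer_to_sql(answer_vars, atoms, columns):
    """Turn positive recording atoms into a SELECT with join conditions."""
    first_seen = {}            # variable -> "alias.column" of first occurrence
    from_items, conditions = [], []
    for i, (table, args) in enumerate(atoms):
        alias = f"t{i}"        # aliases allow a table to occur more than once
        from_items.append(f"{table} AS {alias}")
        for column, var in zip(columns[table], args):
            ref = f"{alias}.{column}"
            if var in first_seen:
                # A repeated variable yields an equality (join) condition.
                conditions.append(f"{first_seen[var]} = {ref}")
            else:
                first_seen[var] = ref
    select = ", ".join(f"{first_seen[v]} AS {v}" for v in answer_vars)
    sql = f"SELECT {select} FROM {', '.join(from_items)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql

atoms = [("person", ["X"]), ("takesCourse", ["X", "Y"]), ("course", ["Y"])]
columns = {"person": ["id"], "takesCourse": ["student", "course"], "course": ["id"]}
sql = schematic_answer_to_sql(["X"], atoms, columns)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id TEXT);
    CREATE TABLE takesCourse (student TEXT, course TEXT);
    CREATE TABLE course (id TEXT);
    INSERT INTO person VALUES ('p1'), ('p2');
    INSERT INTO takesCourse VALUES ('p1', 'c1'), ('p2', 'c9');
    INSERT INTO course VALUES ('c1');
""")
answers = [x for (x,) in conn.execute(sql)]
print(sql)
print(answers)
```

In this toy database p2 takes c9, which is not recorded as a course, so only p1 satisfies all three recording atoms.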

Incremental query rewriting. It bears repeating that, in general, resolution over DB abstractions in the form of 
constrained clauses may produce many, even infinitely many, schematic answers and, consequently, SQL queries. 
They are produced one by one, and the union of their answers covers the whole set of concrete answers to the query. 
If there is only a finite number of concrete answers, e. g., if the query allows concrete answers to contain only plain 
data items from the database, then all concrete answers are covered after some finite number of steps. In a sense, 
the original semantic query is rewritten as a sequence of SQL queries, so we call our technique incremental query
rewriting. 
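
The incremental character of the scheme can be pictured as a loop that consumes SQL queries one at a time and reports each new concrete answer as soon as it is found. The sketch below is only an illustration: the generator `answer_incrementally` is a hypothetical name, the two SQL queries are the ones derived in this section, and the stream of queries is modelled by a plain Python iterator that a reasoner would feed in practice.

```python
import sqlite3

def answer_incrementally(conn, sql_queries):
    """Yield each distinct concrete answer once, as SQL queries arrive."""
    seen = set()
    for sql in sql_queries:            # may be an unbounded stream
        for row in conn.execute(sql):
            if row not in seen:        # the final result is a union
                seen.add(row)
                yield row

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE graduateStudent (id TEXT);
    CREATE TABLE person (id TEXT);
    CREATE TABLE takesCourse (student TEXT, course TEXT);
    CREATE TABLE course (id TEXT);
    INSERT INTO graduateStudent VALUES ('s1');
    INSERT INTO person VALUES ('p1'), ('s1');
    INSERT INTO takesCourse VALUES ('p1', 'c1'), ('s1', 'c1');
    INSERT INTO course VALUES ('c1');
""")
queries = [
    "SELECT id AS X FROM graduateStudent",
    "SELECT person.id AS X FROM person, takesCourse, course "
    "WHERE person.id = takesCourse.student AND course.id = takesCourse.course",
]
answers = list(answer_incrementally(conn, iter(queries)))
print(answers)
```

Because the loop is a generator, answers from early schematic answers reach the user before later SQL queries have even been produced.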

Benefits. The main advantage of the proposed scheme is the expressivity scalability. For example, in applications 
not requiring termination, the expressivity of the knowledge representation formalisms is only limited by the expressivity
of full FOL⁶, although specialised treatment of various FOL fragments is likely to be essential for
good performance. The use of such a powerful logic as FOL as the common platform also allows easy practical 
simultaneous use of heterogeneous knowledge bases, at least for some data retrieval tasks. In particular, it means 
that users can freely mix all kinds of OWL and RDFS ontologies with all kinds of (monotonic) declarative rule sets, 
e. g., in RuleML or SWRL. 

It is important that we don't pay too high a price in terms of performance, for the extra expressivity. The method 
has good data scalability: roughly, the cost of reasoning is not multiplied by the volume of data. Note also that we 
don't have to do any static conversion of the data into a different data model, e. g., RDF triples or OWL ABox - 
querying can be done on live databases via the hosting RDBMSs. All this makes our method potentially usable with 
very large databases in real-life settings. 

An additional advantage of our approach is that answers to semantic queries can be relatively easily given rigorous 
explanations. Roughly speaking, if we need to explain a concrete answer, we simply instantiate the derivation of 
the corresponding schematic answer by replacing DB table abstractions with concrete DB rows, and propagating 
this data through the derivation. Thus, we obtain a resolution proof of the answer, which can be relatively easily 
analysed or transformed into a more intuitive representation. 

3 Soundness and completeness of schematic answer computation.

So far we have only speculated that schematic answer search can be implemented based on resolution. In this section 
we are going to put it on a formal basis. We will show that in the context of FOL without equality some popular 

6 Complete methods for efficient schematic answer finding in FOL with equality are yet to be formulated and proved formally (see
the brief discussion in Section 8).
resolution-based methods can deliver the desired results. In particular, we will characterise a class of resolution-
based calculi that are both sound and complete for query answering over database abstractions. 
We assume familiarity of the reader with the standard notions of first-order logic, such as terms, formulas, literals 
and clauses, substitutions, etc., and some key results, such as Herbrand's theorem. Bibliographic references are
provided for more specialised concepts and facts. 

3.1 Definitions. 

Deductive queries. In our settings, a deductive query is a triple $\langle DB, KB, \varphi \rangle$, where (i) the logical representation
$DB$ of some relational database is a set of ground atomic non-equality formulas, each representing a row in a table in
the database, (ii) the knowledge base $KB$ is a finite set of FOL axioms, corresponding to both the domain ontologies
and semantic RDB schema mappings in our scenario, and (iii) the goal $\varphi$ of the query is a construct of the form
$\langle X_1, \ldots, X_k \rangle \langle Y_1, \ldots, Y_m \rangle C$, where $C$ is a nonempty clause, $k, m \geq 0$, $\{X_1, \ldots, X_k, Y_1, \ldots, Y_m\} = vars(C)$,
and all $X_i$ and $Y_j$ are pairwise distinct. We call $X_i$ distinguished variables, and $Y_j$ undistinguished variables of the
query. Intuitively, the deductive query represents a request to find all $X_i$ such that there exist some $Y_j$ such that
$\varphi(X, Y)$ is inconsistent with $DB \cup KB$. In other words, answers to the query refute $\varphi$ rather than prove it. This
convention is made for technical convenience. Users of our technology can work in terms of positive queries.

Recording literals. In our settings, a clause with recording literals⁷ is a construct of the following form: $C \mid \gamma$,
where $C$ is a regular first-order clause, possibly empty, and $\gamma$ is a finite multiset of literals, possibly empty. We will
say that the literals of $\gamma$ are recording literals.

Semantically, $C \mid \lambda_1, \ldots, \lambda_n$ is the same as the regular clause $C \lor \neg\lambda_1 \lor \ldots \lor \neg\lambda_n$, which will be denoted as
$Sem(C \mid \lambda_1, \ldots, \lambda_n)$. All semantic relations between $Sem(C \mid \gamma)$ and other formulas are transferred to $C \mid \gamma$. For
example, when we say that $C \mid \gamma$ is implied by something, it means that $Sem(C \mid \gamma)$ is implied, and vice versa.
Regular clauses will often be identified with clauses with empty recording parts, i.e., we will not distinguish $C$ from
$C \mid \emptyset$.
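
As a small worked instance, assume that each recording literal contributes its complement to the semantic image of a clause, which is the reading under which abstraction clauses of the form p | p come out as tautologies. The schematic answer derived in Section 2 then unfolds into an ordinary implication:

```latex
\[
Sem(\Box \mid \neg answer(X),\ grStud(X))
  \;=\; answer(X) \lor \neg grStud(X)
  \;\equiv\; grStud(X) \rightarrow answer(X)
\]
```

This agrees with the informal reading given earlier: every value found in the table graduateStudent is a legitimate answer.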

We say that a clause $C' \mid \gamma'$ subsumes the clause $C \mid \gamma$ iff there is a substitution $\theta$ that makes $C'\theta$ a submultiset of
$C$, and $\gamma'\theta$ a submultiset of $\gamma$. In this case we will also say that $C' \mid \gamma'$ is a generalisation of $C \mid \gamma$.

Concrete and schematic answers. We distinguish a special predicate symbol $@$⁸. A ground atomic formula
$@(t_1, \ldots, t_k)$ is a concrete answer to the deductive query $\langle DB, KB, \langle X_1, \ldots, X_k \rangle \langle Y_1, \ldots, Y_m \rangle C \rangle$
if the clause $C[X_1/t_1, \ldots, X_k/t_k]$ is inconsistent with $DB \cup KB$ or, equivalently, the formula
$\exists Y_1 \ldots Y_m \neg C[X_1/t_1, \ldots, X_k/t_k]$ is implied by $DB \cup KB$.

We say that a clause $\Box \mid \gamma$ is a schematic answer to a deductive query $\langle DB, KB, \langle X_1, \ldots, X_k \rangle \langle Y_1, \ldots, Y_m \rangle C \rangle$
if every atomic ground formula of the form $@(t_1, \ldots, t_k)$ implied by $DB \cup \{\Box \mid \gamma\}$ is a concrete answer to the
query. Every such concrete answer will be called an instance of the schematic answer.

Database abstractions. In our settings, a finite set $DB'$ of clauses of the form $p(t_1, \ldots, t_k) \mid p(t_1, \ldots, t_k)$ is an
abstraction of the logical representation $DB$ of a database if for every atomic formula $\rho \in DB$, there is a clause
$\rho' \mid \rho' \in DB'$ and a substitution $\theta$ such that $\rho'\theta = \rho$. Note that semantically all clauses in $DB'$ are tautologies,
because $Sem(p(t_1, \ldots, t_k) \mid p(t_1, \ldots, t_k)) = p(t_1, \ldots, t_k) \lor \neg p(t_1, \ldots, t_k)$.

The simplest kind of an abstraction for an RDB is the set of all clauses $p(X_1, \ldots, X_k) \mid p(X_1, \ldots, X_k)$, where all
$X_i$ are pairwise distinct variables, and each $p$ corresponds to a table in the RDB. Dealing with such an abstraction
can be viewed as reasoning on the schema of the RDB. However, in principle, we can have more specific abstractions.
For example, if we know that the first column of our RDB table $p$ contains only values $a$ and $b$, we may choose
to have two abstraction clauses: $p(a, X_2, \ldots, X_k) \mid p(a, X_2, \ldots, X_k)$ and $p(b, X_2, \ldots, X_k) \mid p(b, X_2, \ldots, X_k)$⁹.
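
The simplest abstraction can be read directly off the database schema. The sketch below assumes SQLite, whose catalogue table `sqlite_master` and `PRAGMA table_info` expose table names and columns; the function name `simplest_abstraction` and the textual clause format are assumptions of this example.

```python
import sqlite3

def simplest_abstraction(conn):
    """Return one clause 'p(X1, ..., Xk) | p(X1, ..., Xk)' per table."""
    clauses = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # The number of columns of the table is the arity of the predicate.
        arity = len(conn.execute(f'PRAGMA table_info("{table}")').fetchall())
        args = ", ".join(f"X{i}" for i in range(1, arity + 1))
        atom = f"{table}({args})"
        clauses.append(f"{atom} | {atom}")
    return clauses

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE takesCourse (student TEXT, course TEXT);
    CREATE TABLE course (id TEXT);
""")
clauses = simplest_abstraction(conn)
for clause in clauses:
    print(clause)
```

Note that the abstraction is computed from the schema alone, so its size does not depend on the number of rows stored in the tables.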

Calculi. In this research note we only deal with calculi that are sound and complete variants of resolution¹⁰ (see,
e. g., [5]). All inference rules in these calculi are of the form

7 We prefer this to the more general term "constrained clause" because we want to emphasise the nature and the role of our 
constraints, and to avoid confusion with other kinds of constraints used in automated reasoning and logic programming. 

8 Corresponds to the predicate answer used in our previous examples. 

9 Moreover, we can have just one abstraction clause, e. g., $p(X_1, \ldots, X_k) \mid p(X_1, \ldots, X_k), X_1 \in \{a, b\}$ with the additional ad
hoc constraint $X_1 \in \{a, b\}$, but this kind of optimisation is outside the scope of this research note.

10 Paramodulation is also briefly discussed as a future research opportunity in Section 8.
$$\frac{C_1 \quad C_2 \quad \ldots \quad C_n}{D}$$

where $C_i$ and $D$ are ordinary clauses, and $n \geq 1$. Most such rules have a substitution $\theta$ associated with them, which
is required to unify some subexpressions in $C_i$, usually atoms of complementary literals. Rules in the calculi that
are of interest to us can be easily extended to clauses with recording literals:

$$\frac{C_1 \mid \gamma_1 \quad C_2 \mid \gamma_2 \quad \ldots \quad C_n \mid \gamma_n}{D \mid \gamma_1\theta, \gamma_2\theta, \ldots, \gamma_n\theta}$$

So, for example, here is the binary resolution rule extended to clauses with recording literals:

$$\frac{C'_1 \lor A \mid \gamma_1 \quad C'_2 \lor \neg B \mid \gamma_2}{C'_1\theta \lor C'_2\theta \mid \gamma_1\theta, \gamma_2\theta}$$

where $\theta$ is the most general unifier of the atoms $A$ and $B$.
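
The extended binary resolution rule can be sketched in a few dozen lines. The encoding below is an assumption of this example, not part of the method: variables are strings starting with an upper-case letter, constants are other strings, compound terms and atoms are tuples `(symbol, arg1, ...)`, a literal is a pair `(is_positive, atom)`, and a clause is a pair `(literals, recording_literals)`. The premises are assumed to be variable-disjoint, and factoring is not covered.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """Return a most general unifier extending s, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def subst(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, s) for a in t[1:])
    return t

def resolve(c1, i, c2, j):
    """Resolve literal i of c1 against literal j of c2.
    Recording literals of both premises are inherited and instantiated."""
    (lits1, rec1), (lits2, rec2) = c1, c2
    (pos1, atom1), (pos2, atom2) = lits1[i], lits2[j]
    if pos1 == pos2:
        return None                      # literals are not complementary
    theta = unify(atom1, atom2, {})
    if theta is None:
        return None
    rest = ([l for k, l in enumerate(lits1) if k != i]
            + [l for k, l in enumerate(lits2) if k != j])
    inst = lambda lits: [(p, subst(a, theta)) for p, a in lits]
    return inst(rest), inst(rec1 + rec2)

# grStud(X1) | grStud(X1)   and   ¬grStud(X) ∨ pers(X)
abstraction = ([(True, ("grStud", "X1"))], [(True, ("grStud", "X1"))])
axiom = ([(False, ("grStud", "X")), (True, ("pers", "X"))], [])
conclusion = resolve(abstraction, 0, axiom, 0)
print(conclusion)
```

The conclusion encodes pers(X) | grStud(X): the recording literal of the abstraction clause survives the inference and is instantiated with the same unifier as the logical part.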

If a calculus R' is obtained by extending the rules of a calculus R to clauses with recording literals, we will simply 
say that R' is a calculus with recording literals and R is its projection to regular clauses. 

Apart from nonredundant inferences, resolution calculi used in practice usually include some admissible redundant 
inferences. Implementors have the freedom of performing or not performing such inferences without affecting the 
completeness of the reasoning process. However, for the purposes of this research note it is convenient to assume
that calculi being considered only contain nonredundant inferences. This assumption does not affect generality. 
A calculus with recording literals is sound if Sem of the conclusion of every derivation is logically implied by 
the Sem images of the clauses in the leaves. It is obvious that a calculus with recording literals is sound if its 
projection to regular clauses is sound because recording literals are fully inherited. A calculus with recording literals 
is refutationally complete if its projection to regular clauses is refutationally complete, i.e., an empty clause can be 
derived from any unsatisfiable set of clauses. 

In this research note we will mention fully specified calculi to distinguish them from generic (parameterised) calculi. 
For example, the ordered binary resolution in general is not fully specified - it is a generic calculus parameterised 
by an order on literals. If we fix this parameter by specifying a concrete order, we obtain a fully specified calculus. 
We view a fully specified calculus as the set of all its elementary inferences. 

We say that a fully specified calculus R with recording literals is generalisation-tolerant if every inference in R is 
generalisation-tolerant. An elementary inference 

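$$\frac{C_1 \mid \gamma_1 \quad C_2 \mid \gamma_2 \quad \ldots \quad C_n \mid \gamma_n}{D \mid \delta}$$

from the calculus $R$ is generalisation-tolerant if for every generalisation $C'_i \mid \gamma'_i$ of a premise $C_i \mid \gamma_i$, the calculus $R$
also contains an elementary inference of some generalisation $D' \mid \delta'$ of $D \mid \delta$, where the premises are a submultiset
of $\{C_1 \mid \gamma_1, \ldots, C_{i-1} \mid \gamma_{i-1}, C'_i \mid \gamma'_i, C_{i+1} \mid \gamma_{i+1}, \ldots, C_n \mid \gamma_n\}$.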

Unordered binary resolution and hyperresolution provide simple examples of generalisation-tolerant calculi. Their
ordered versions using admissible orderings (see, e. g., [5]) also cause no problems because application of generalisation
to a clause cannot make a maximal literal nonmaximal, because of the substitution property of admissible
orderings: $L_1 > L_2$ implies $L_1\theta > L_2\theta$. Adding (negative) literal selection (see, e. g., [5]) requires some care. In
general, if a literal is selected in a clause, its image, if it exists, in any generalisation should be selected too. Such 
selection functions are still possible. For example, we can select all negative literals that are maximal w. r. t. some 
ordering satisfying the substitution property. In this case, however, we can no longer restrict ourselves to selecting 
a single literal in a clause, because the ordering can only be partial. 

Note that such calculi are the main workhorses of several efficient FOL reasoners, e. g., Vampire. 

3.2 Soundness. 



Theorem 1. Suppose $R$ is a sound fully specified calculus with recording literals. Consider a deductive query 
$Q = \langle DB, KB, \langle X_1, \ldots, X_k \rangle \langle Y_1, \ldots, Y_m \rangle C \rangle$. Suppose $DB'$ is an 
abstraction of $DB$. Suppose we can derive in $R$ a clause $\Box \mid \gamma$ from 
$DB' \cup KB \cup \{C \mid \neg @(X_1, \ldots, X_k)\}$. Then $\Box \mid \gamma$ is a schematic answer to $Q$. 

Proof. Suppose $DB \cup \{\Box \mid \gamma\}$ implies a ground formula $@(t_1, \ldots, t_k)$. We have to show that 
$@(t_1, \ldots, t_k)$ is a concrete answer to $Q$, i.e., $DB \cup KB \cup \{C[X_1/t_1, \ldots, X_k/t_k]\}$ is unsatisfiable. 

Since $R$ is sound, $\Box \mid \gamma$ is derived from $DB' \cup KB \cup \{C \mid \neg @(X_1, \ldots, X_k)\}$ and $DB'$ 
contains only clauses that are semantically tautologies, the clause $\Box \mid \gamma$ is implied by 
$KB \cup \{C \mid \neg @(X_1, \ldots, X_k)\}$. Under our assumption, this means that 
$DB \cup KB \cup \{C \mid \neg @(X_1, \ldots, X_k)\}$ implies $@(t_1, \ldots, t_k)$. Note that the predicate $@$ does not 
occur in $DB$, $KB$ or $C$ and, therefore, $DB \cup KB \cup \{C[X_1/t_1, \ldots, X_k/t_k]\}$ is unsatisfiable. 

3.3 Completeness. 

Theorem 2. Suppose $R$ is a refutationally complete and generalisation-tolerant fully specified calculus with 
recording literals. Consider a deductive query 
$Q = \langle DB, KB, \langle X_1, \ldots, X_k \rangle \langle Y_1, \ldots, Y_m \rangle C \rangle$. Suppose $DB'$ 
is an abstraction of $DB$. Then, for every concrete answer $@(t_1, \ldots, t_k)$ to $Q$ one can derive in $R$ from 
$DB' \cup KB \cup \{C \mid \neg @(X_1, \ldots, X_k)\}$ a clause $\Box \mid \gamma$, such that $@(t_1, \ldots, t_k)$ is an 
instance of the schematic answer $\Box \mid \gamma$. 

Proof. The refutational completeness of $R$ means that we can construct a refutation $\Delta$ of 
$DB \cup KB \cup \{C[X_1/t_1, \ldots, X_k/t_k]\}$. The main idea of this proof is that in a generalisation-tolerant 
calculus finding an answer to a query is not much more difficult than just proving the answer. Technically, we will 
convert $\Delta$ into a derivation of a schematic answer covering the concrete answer $@(t_1, \ldots, t_k)$. 

Assume that $\rho_i$, $i \in [1 \ldots p]$, are all the facts from $DB$ that contribute to $\Delta$ (as leaves of the 
refutation). We can convert $\Delta$ into a derivation $\Delta'$ of a clause of the form 
$\Box \mid \rho_1, \ldots, \rho_p, \neg A_1, \ldots, \neg A_n$, where $p, n \geq 0$ and all atoms 
$A_i = @(t_1, \ldots, t_k)$, from the clauses $\rho_1 \mid \rho_1, \ldots, \rho_p \mid \rho_p$, 
$C[X_1/t_1, \ldots, X_k/t_k] \mid \neg @(t_1, \ldots, t_k)$ and some clauses from $KB$. To this end, we simply add the 
recording literals in the corresponding leaves of $\Delta$ and propagate them all the way to the root. Obviously, 
$DB \cup \{\Box \mid \rho_1, \ldots, \rho_p, \neg A_1, \ldots, \neg A_n\}$ implies $@(t_1, \ldots, t_k)$. 

To complete the proof, we will show that $\Delta'$ can be converted into a derivation of a generalisation 
$\Box \mid \gamma$ for the clause $\Box \mid \rho_1, \ldots, \rho_p, \neg A_1, \ldots, \neg A_n$ from 
$DB' \cup KB \cup \{C \mid \neg @(X_1, \ldots, X_k)\}$. This is a corollary of a more general statement: if we can 
derive some clause $D$ from clauses $C_1, \ldots, C_q$ in $R$, and $C_1', \ldots, C_q'$ are some generalisations 
of those clauses, then there is a derivation from some of $C_1', \ldots, C_q'$ in $R$ of some generalisation $D'$ of 
$D$. This can be easily proved by induction on the complexity of the derivation, taking into account the 
generalisation-tolerance of $R$. 

Finally, note that $\Box \mid \gamma$ implies $\Box \mid \rho_1, \ldots, \rho_p, \neg A_1, \ldots, \neg A_n$, and 
therefore $DB \cup \{\Box \mid \gamma\}$ implies $@(t_1, \ldots, t_k)$. 

4 Recording literals as search space pruning constraints. 

Let us make an important observation: some schematic answers to deductive queries cover no concrete answers. 
These schematic answers are useless and the work spent on their generation is wasted. We can address this problem 
by trying to block search directions that can only lead to such useless schematic answers. 

Suppose we are searching for schematic answers to 
$\langle DB, KB, \langle X_1, \ldots, X_k \rangle \langle Y_1, \ldots, Y_m \rangle C \rangle$ by deriving consequences 
of $DB' \cup KB \cup \{C \mid \neg @(X_1, \ldots, X_k)\}$ in an appropriate calculus, where $DB'$ is an abstraction of 
$DB$. 

4.1 Database abstraction literals. 

Suppose we have derived a clause $E = D \mid \rho_1', \ldots, \rho_p', \neg A_1, \ldots, \neg A_n$ where $p > 0$, 
$n > 0$, all the atoms $A_i$ are of the form $@(t_1, \ldots, t_k)$ and all the literals $\rho_j'$ are inherited from the 
recording literals of clauses from $DB'$. We can treat $\rho_1', \ldots, \rho_p'$ as follows: if we can somehow 
establish that the constraint $\rho_1', \ldots, \rho_p'$ has no solutions w. r. t. $DB$, we can remove the clause $E$ 
from the search space. A solution of $\rho_1', \ldots, \rho_p'$ w. r. t. $DB$ is a substitution $\theta$, such that all 
$\rho_i'\theta \in DB$. 

Such a treatment can be justified with the following argument. It is obvious that if $\rho_1', \ldots, \rho_p'$ has no 
solutions w. r. t. $DB$, then any more specific constraint $\rho_1'\sigma, \ldots, \rho_p'\sigma$, where $\sigma$ is 
some substitution, also has no solutions. Since all recording literals are fully inherited in the calculi we are 
dealing with, any clause derived from $E$ and any other clauses will have the same property. Therefore, any schematic 
answer $\Box \mid \gamma$ whose derivation contains the clause $E$ will contain in $\gamma$ a nonempty subconstraint 
without $@$, having no solutions w. r. t. $DB$. Thus, $\Box \mid \gamma$ cannot cover any concrete answers because the 
non-$@$ part of the constraint $\gamma$ cannot be satisfied. 

To summarise, we can discard clauses like $E$ without sacrificing the completeness w. r. t. concrete answers. 
Practically, this can be done by converting $\rho_1', \ldots, \rho_p'$ into an SQL query (similar to how it is done in 
Section 5 for schematic answers) and evaluating the query on the database - an empty result set indicates the absence 
of solutions w. r. t. $DB$. 
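The emptiness test can be illustrated without a DBMS. The following is only a minimal sketch under assumptions that are not part of the method itself: facts and database abstraction literals are encoded as (predicate, arguments) pairs, variables are strings starting with an upper-case letter, and the hypothetical helpers find_solution and can_discard search an in-memory fact set by backtracking instead of issuing an SQL query.

```python
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate name, argument tuple)


def is_var(term: str) -> bool:
    """Assumed convention: variables start with an upper-case letter."""
    return term[:1].isupper()


def _bind(arg: str, const: str, theta: Dict[str, str]) -> bool:
    """Match one literal argument against a fact constant, extending theta."""
    if is_var(arg):
        return theta.setdefault(arg, const) == const
    return arg == const


def find_solution(constraint: List[Atom], db: Set[Atom],
                  theta: Optional[Dict[str, str]] = None) -> Optional[Dict[str, str]]:
    """Return a substitution mapping every literal of the constraint to a
    fact in db, or None if the constraint has no solutions w.r.t. db."""
    theta = {} if theta is None else theta
    if not constraint:
        return theta
    (pred, args), rest = constraint[0], constraint[1:]
    for fact_pred, fact_args in db:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        candidate = dict(theta)  # work on a copy so that backtracking is safe
        if all(_bind(a, c, candidate) for a, c in zip(args, fact_args)):
            solution = find_solution(rest, db, candidate)
            if solution is not None:
                return solution
    return None


def can_discard(constraint: List[Atom], db: Set[Atom]) -> bool:
    """A clause may be pruned if its database abstraction literals are unsolvable."""
    return find_solution(constraint, db) is None
```

For instance, over the facts parent(ann, bob) and parent(bob, cid), the constraint parent(X, Y), parent(Y, Z) is solvable, whereas parent(X, X) is not, so a clause recording the latter could be discarded.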

4.2 Answer literals. 

Suppose we have derived a schematic answer $\Box \mid D, \neg A_1, \ldots, \neg A_n$ where $D$ only contains database 
abstraction literals or is empty, and $n > 0$. For the schematic answer to have instances, the answer literals 
$\neg A_i$ must be simultaneously unifiable. Indeed, suppose $@(t_1, \ldots, t_k)$ is an instance of the schematic 
answer. By Herbrand's theorem, $DB \cup \{\neg @(t_1, \ldots, t_k)\}$ is inconsistent with a finite set of ground 
clauses of the form $\Box \mid D\theta, \neg A_1\theta, \ldots, \neg A_n\theta$. 

We assume that the set is minimal. It cannot be empty because $@$ does not occur in $DB$ and $DB$ itself is trivially 
consistent. Consider any clause $\Box \mid D\theta, \neg A_1\theta, \ldots, \neg A_n\theta$ from the set. All the atoms 
$A_i\theta$ from this clause are equal to $@(t_1, \ldots, t_k)$ because otherwise the set would not be minimal - any 
model of the set without this clause could be extended to make this clause true by making an appropriate $A_i\theta$ 
true. Thus, all $A_i$ are simultaneously unifiable. 

The fact proved above can be used to prune the search space as follows: if we derive an intermediate clause with 
some @-literals that are not simultaneously unifiable, we can discard the clause because any schematic answer 
derived from it will have no instances. Moreover, we can use the most general unifier for @-literals to strengthen 
the test on database abstraction literals by applying the unifier to them before solving them on the database. 
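The unifiability test on answer literals can be sketched with a textbook syntactic unification procedure. The term encoding below is an assumption made only for illustration: variables are strings starting with an upper-case letter, constants are other strings, and compound terms and atoms are tuples whose first element is the functor. The returned substitution is in triangular form, i.e., bindings may refer to other bound variables.

```python
from typing import Dict, Optional, Sequence, Tuple, Union

Term = Union[str, Tuple]  # variable, constant, or (functor, arg1, ..., argN)
Subst = Dict[str, Term]


def is_var(term: Term) -> bool:
    """Assumed convention: variables are strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def walk(term: Term, subst: Subst) -> Term:
    """Follow variable bindings until an unbound variable or a non-variable is reached."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def occurs(var: str, term: Term, subst: Subst) -> bool:
    """Occurs check: does var appear in term under the current bindings?"""
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg, subst) for arg in term[1:])
    return False


def unify(a: Term, b: Term, subst: Subst) -> Optional[Subst]:
    """Extend subst to a unifier of a and b, or return None if none exists."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return None if occurs(b, a, subst) else {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) and a[0] == b[0]:
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


def simultaneously_unifiable(atoms: Sequence[Term]) -> Optional[Subst]:
    """Return a unifier of all the given answer atoms, or None (clause can be pruned)."""
    subst: Optional[Subst] = {}
    for other in atoms[1:]:
        subst = unify(atoms[0], other, subst)
        if subst is None:
            return None
    return subst
```

Under this encoding, the answer atoms @(X, a) and @(b, Y) are unifiable, while @(X, a) and @(b, c) are not, so a clause recording the latter pair could be discarded.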

5 SQL generation. 

Suppose that we have found a schematic answer $\Box \mid \rho_1, \ldots, \rho_p, \neg A_1, \ldots, \neg A_n$ to a query 
$\langle DB, KB, \langle X_1, \ldots, X_k \rangle \langle Y_1, \ldots, Y_m \rangle C \rangle$. Now our task is to 
enumerate all instances of the schematic answer by querying the relational database modeled by the fact set $DB$, 
with an SQL query. 

We have four cases to consider. 

(1) If $p = n = 0$, then we simply have a refutation of $KB$. Formally, this means that any ground 
$@(t_1, \ldots, t_k)$ is a correct answer, but for practical purposes this is useless. Instead, we should simply inform 
the user about the inconsistency. 

(2) If $p = 0$ but $n \neq 0$, we have to try to unify all the literals $A_i$. If 
$\theta = mgu(A_1, \ldots, A_n)$, then the set of instances of the schematic answer coincides with the set of ground 
instances of $A_1\theta$. 

(3) If $p \neq 0$ but $n = 0$, there is a possibility that $DB \cup KB$ is inconsistent. We may want to check this 
possibility by checking if $\rho_1, \ldots, \rho_p$ has solutions over $DB$ - if it does, $DB$ is inconsistent with 
$KB$. The check itself can be done by converting $\rho_1, \ldots, \rho_p$ into an SQL query as in the next case, and 
checking if an answer to the SQL query exists. 

(4) In the rest of this section we will be considering the most interesting case, when $p \neq 0$ and $n \neq 0$. 

5.1 Merging and flattening answer literals. 

In fact, we only need to consider the case when $n = 1$. Indeed, if $\Box \mid \gamma$, where 
$\gamma = D, \neg A_1, \ldots, \neg A_n$, is a schematic answer, it is only interesting to us if all $A_i$ are 
simultaneously unifiable, as demonstrated in Section 4.2. So, suppose $\theta = mgu(A_1, \ldots, A_n)$. We are going to 
show that $\Box \mid \gamma$ and $\Box \mid \gamma\theta$ cover the same sets of concrete answers. Suppose that 
$@(t_1, \ldots, t_k)$ is an instance of $\Box \mid \gamma$. Keeping in mind Herbrand's theorem, consider 
a minimal set of ground clauses of the form $\Box \mid \gamma\theta_j$, $j = 1 \ldots m$, inconsistent with 
$DB \cup \{\neg @(t_1, \ldots, t_k)\}$. As was shown in Section 4.2, all @-literals in the clauses 
$\Box \mid \gamma\theta_j$ are equal to $\neg @(t_1, \ldots, t_k)$. Assuming that $\theta$ only affects variables 
occurring in $A_i$, i. e., all other variables in $\gamma$ are only renamed by $\theta$ (otherwise the unifier 
$\theta$ would not be the most general one), we can conclude that $\Box \mid \gamma\theta$ subsumes each of the clauses 
$\Box \mid \gamma\theta_j$ and, therefore, $DB \cup \{\neg @(t_1, \ldots, t_k)\} \cup \{\Box \mid \gamma\theta\}$ is 
unsatisfiable, which makes $@(t_1, \ldots, t_k)$ an instance of $\Box \mid \gamma\theta$. The opposite direction is 
trivial. Now, since we can replace $\gamma$ with $\gamma\theta$, we can also replace it with the shorter 
constraint $D\theta, \neg A_1\theta$. 

We can make another simplifying assumption: we only have to deal with schematic answers of the form 
$\Box \mid D, \neg @(X_1, \ldots, X_k)$, where $X_i$ are pairwise distinct variables, each $X_i$ occurs in $D$, and $D$ 
contains only database abstraction literals. If we need to enumerate instances of $\Box \mid D, \neg A$ where $A$ is a 
more complex @-literal, we enumerate instances of $\Box \mid D, \neg @(X_1, \ldots, X_k)$, where 
$\{X_1, \ldots, X_k\} = vars(A) \cap vars(D)$, and for every such instance $@(t_1, \ldots, t_k)$ we construct all 
instances $A[X_1/t_1, \ldots, X_k/t_k]\sigma$ of the original schematic answer, where the substitutions $\sigma$ can 
instantiate variables not occurring in $D$ with arbitrary ground terms. In practice, we can explicitly report such 
variables as universally quantified in the answers, which is the approach adopted by most, if not all, Prolog 
implementations. 

5.2 Flattening the database abstraction literals. 

Recall that all facts in $DB$ are of the form $r_i(a_1, \ldots)$, where the predicates $r_i$ correspond to tables in a 
relational database and all $a_j$ are constants. This and the considerations from Section 4.1 justify the following 
assumption: literals from $D$ do not contain compound terms, i. e., all their arguments are variables or constants. If 
this condition is false, the schematic answer is simply useless because $D$ has no solutions w. r. t. $DB$. 

One final transformation of schematic answers is needed to make the SQL query generation straightforward. 
Namely, we can represent the schematic answer with a semantically equivalent clause of the form 
$E_a \vee E_c \vee E_d \vee D_x \vee A$, where 

(i) $A = @(X_1, \ldots, X_k)$ and all answer variables $X_i$ are pairwise distinct; 

(ii) $D_x = \neg r_1(Y^1_1, \ldots, Y^1_{k(1)}) \vee \ldots \vee \neg r_p(Y^p_1, \ldots, Y^p_{k(p)})$ and all variables 
$Y^i_j$ are pairwise distinct; 

(iii) $E_a$ consists of $k$ negative equality literals $\alpha_i \not\simeq X_i$, $i = 1 \ldots k$, where 
$\alpha_i \in \{Y^1_1, \ldots, Y^p_{k(p)}\}$; 

(iv) $E_c$ consists of zero or more negative equality literals of the form $\alpha \not\simeq \beta$, where 
$\alpha \in \{Y^1_1, \ldots, Y^p_{k(p)}\}$ and $\beta$ is a constant; 

(v) $E_d$ consists of zero or more negative equality literals of the form $\alpha \not\simeq \beta$, where 
$\alpha, \beta \in \{Y^1_1, \ldots, Y^p_{k(p)}\}$. 

Here is a sketch of an algorithm for the transformation. Suppose we initially have 
$\neg r_1(t^1_1, \ldots, t^1_{k(1)}) \vee \ldots \vee \neg r_p(t^p_1, \ldots, t^p_{k(p)}) \vee @(X_1, \ldots, X_k)$, 
which is just 
$Sem(\Box \mid r_1(t^1_1, \ldots, t^1_{k(1)}), \ldots, r_p(t^p_1, \ldots, t^p_{k(p)}), \neg @(X_1, \ldots, X_k))$. 
Recall that all $t^i_j$ are variables or constants. We transform it into the equivalent clause 
$E \vee \neg r_1(Y^1_1, \ldots, Y^1_{k(1)}) \vee \ldots \vee \neg r_p(Y^p_1, \ldots, Y^p_{k(p)}) \vee @(X_1, \ldots, X_k)$, 
where $E$ consists of all literals of the form $Y^i_j \not\simeq t^i_j$. Now $E_c$ can be easily extracted from $E$ by 
taking all $Y^i_j \not\simeq t^i_j$ where $t^i_j$ are constants. $E_a$ is obtained by taking one literal of the form 
$Y^i_j \not\simeq X_e$ for each answer variable $X_e$. Note that we have some nondeterminism here because the same 
answer variable can occur in more than one literal in $E$. Finally, $E_d$ is obtained by computing the set 
$\{Y^i_j \not\simeq Y^u_v \mid \text{for some nonconstant } t,\ t \not\simeq Y^i_j,\ t \not\simeq Y^u_v \in E\}$ and 
removing redundant literals from it. In this context, a literal $Y^i_j \not\simeq Y^u_v$ is redundant if there are two 
literals $Y^i_j \not\simeq Y^r_s$ and $Y^r_s \not\simeq Y^u_v$ (modulo the symmetry of the equality predicate). Note 
that this process is also nondeterministic because the result depends on the order of removing redundant literals. 
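The transformation can be sketched in a few lines of Python. This is only an illustration under assumed conventions: literals are (table, arguments) pairs, variables are strings starting with an upper-case letter, and the fresh variable $Y^i_j$ is encoded as the position pair (i, j). The hypothetical function flatten resolves both sources of nondeterminism in one fixed way: it picks the first occurrence of each answer variable for $E_a$ and chains consecutive occurrences of each variable for $E_d$, which is one of the possible redundancy-free outcomes.

```python
from typing import Dict, List, Sequence, Tuple

Pos = Tuple[int, int]              # (i, j) stands for the fresh variable Y^i_j
Literal = Tuple[str, Sequence[str]]  # (table/predicate name, arguments)


def is_var(term: str) -> bool:
    """Assumed convention: variables start with an upper-case letter."""
    return term[:1].isupper()


def flatten(literals: Sequence[Literal], answer_vars: Sequence[str]):
    """Split a flat constraint into E_a, E_c, E_d and D_x.

    Every answer variable is assumed to occur in some literal (Section 5.1);
    otherwise a KeyError is raised.
    """
    d_x: List[Tuple[str, List[Pos]]] = []
    e_c: List[Tuple[Pos, str]] = []
    occurrences: Dict[str, List[Pos]] = {}
    for i, (pred, args) in enumerate(literals, start=1):
        d_x.append((pred, [(i, j) for j in range(1, len(args) + 1)]))
        for j, term in enumerate(args, start=1):
            if is_var(term):
                occurrences.setdefault(term, []).append((i, j))
            else:
                e_c.append(((i, j), term))  # Y^i_j must equal a constant
    # E_a: one position per answer variable (first occurrence, an arbitrary choice)
    e_a = [(occurrences[x][0], x) for x in answer_vars]
    # E_d: link consecutive occurrences of the same original variable
    e_d: List[Tuple[Pos, Pos]] = []
    for positions in occurrences.values():
        e_d.extend(zip(positions, positions[1:]))
    return e_a, e_c, e_d, d_x
```

For the constraint parent(X, Y), parent(Y, Z), lives(Z, paris) with answer variables X and Z, this yields $E_a$ = {Y^1_1 for X, Y^2_2 for Z}, $E_c$ = {Y^3_2 with paris} and $E_d$ = {Y^1_2 with Y^2_1, Y^2_2 with Y^3_1}.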

5.3 Forming the SQL query. 

The transformed schematic answer $E_a \vee E_c \vee E_d \vee D_x \vee A$ can be translated into an SQL query of the form 
SELECT ⟨columns⟩ FROM ⟨tables⟩ WHERE ⟨join conditions⟩. 

The expression ⟨columns⟩ is a comma-separated list of answer column declarations of the form $R_i.\#j$ AS $X_e$ 
for each $Y^i_j \not\simeq X_e \in E_a$. Here $R_i$ is a fresh table alias corresponding to the literal 
$r_i(Y^i_1, \ldots, Y^i_{k(i)})$ and $\#j$ denotes the $j$-th attribute name in the table $r_i$ from our RDB schema. 

The expression ⟨tables⟩ is a comma-separated list of table aliases of the form $r_i$ AS $R_i$, $i = 1 \ldots p$. We 
have to use aliases because, in general, some $r_i$ may coincide. 

The expression ⟨join conditions⟩ is a conjunction of elementary join conditions of two kinds: (i) $R_i.\#j = \beta$ 
for each $Y^i_j \not\simeq \beta \in E_c$, and (ii) $R_i.\#j = R_u.\#v$ for each $Y^i_j \not\simeq Y^u_v \in E_d$. 
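The translation can be illustrated with the following sketch, which should be read as one possible rendering rather than the implementation used in our experiments. The function to_sql, the position encoding (i, j) for $Y^i_j$, the example schema and the use of SQLite are all assumptions made for the example. Constants from $E_c$ are passed as query parameters instead of being spliced into the query text.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple

Pos = Tuple[int, int]  # (i, j): j-th argument of the i-th literal, both 1-based


def to_sql(tables: Sequence[str],
           e_a: Sequence[Tuple[Pos, str]],
           e_c: Sequence[Tuple[Pos, str]],
           e_d: Sequence[Tuple[Pos, Pos]],
           schema: Dict[str, List[str]]) -> Tuple[str, List[str]]:
    """Build SELECT <columns> FROM <tables> WHERE <join conditions>.

    tables[i - 1] is the table of the i-th literal of D_x;
    schema maps each table name to its ordered attribute names.
    """
    def col(pos: Pos) -> str:
        i, j = pos
        return f"R{i}.{schema[tables[i - 1]][j - 1]}"

    columns = ", ".join(f"{col(pos)} AS {var}" for pos, var in e_a)
    from_part = ", ".join(f"{t} AS R{i}" for i, t in enumerate(tables, start=1))
    conditions = [f"{col(pos)} = ?" for pos, _ in e_c]          # kind (i)
    conditions += [f"{col(a)} = {col(b)}" for a, b in e_d]      # kind (ii)
    sql = f"SELECT {columns} FROM {from_part}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, [const for _, const in e_c]


def demo():
    """Run the generated query on a tiny hypothetical database."""
    schema = {"parent": ["parent_name", "child_name"], "lives": ["person", "city"]}
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (parent_name TEXT, child_name TEXT)")
    conn.execute("CREATE TABLE lives (person TEXT, city TEXT)")
    conn.executemany("INSERT INTO parent VALUES (?, ?)",
                     [("ann", "bob"), ("bob", "cid"), ("bob", "dan")])
    conn.executemany("INSERT INTO lives VALUES (?, ?)",
                     [("cid", "paris"), ("dan", "rome")])
    # Flattened form of parent(X, Y), parent(Y, Z), lives(Z, paris), answer @(X, Z)
    sql, params = to_sql(
        tables=["parent", "parent", "lives"],
        e_a=[((1, 1), "X"), ((2, 2), "Z")],
        e_c=[((3, 2), "paris")],
        e_d=[((1, 2), (2, 1)), ((2, 2), (3, 1))],
        schema=schema,
    )
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return sql, rows
```

In this example the three literals give rise to the aliases R1, R2 and R3, with the table parent used twice, which is exactly the situation that makes aliases necessary.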

6 A note on indexing Semantic Web documents with data abstractions. 

In the context of Semantic Web (SW), it is important to be able to index distributed semantic data description sets 
(SW documents, for simplicity), so that, given a semantic query modulo some knowledge bases, we can load only 
the SW documents that are potentially relevant to the query. In this section we briefly sketch a possible scheme for 
such indexing that is compatible with our approach to deductive querying. 

Conventional search engines index regular Web documents by words appearing in them. We cannot simply follow 
this example by indexing SW documents by the names of objects, concepts and relations occurring in them. This 
is so because retrieval in general may require reasoning, and thus the relevant documents may use no common 
symbols with the query. For example, a query may request to find animals of bright colours. If some SW document 
describes, e. g., pink elephants, it is relevant, but lexically there is no overlap with the query. Only reasoning reveals 
the relation between 'http://zooontology.org/concept#elephant' and 'http://zooontology.org/concept#animal', and 
between 'http://www.colors.org/concept#pink' and 'http://www.colors.org/concept#bright_colour'. 

Note that conceptually there is hardly any difference between RDBs and, say, OWL data description sets based 
on the Web: an RDB can be modeled as a set of ground atomic logical assertions, and, practically, an SW document 
is such a set. So, just like we use abstractions to represent relational data compactly in reasoning, we 
can use abstractions to represent SW documents. For example, a potentially large SW document introducing 
many pink elephants can be compactly represented by its abstraction zoo:elephant(X) | zoo:elephant(X), 
colors:hasColour(X,Y) | colors:hasColour(X,Y) and colors:pink(X) | colors:pink(X). 

It seems natural to use such abstraction clauses as indexes to the corresponding SW documents in a semantic search 
engine. Then, the query answering process can be organised as follows. As in the case of reasoning over RDB 
abstractions, a reasoner is used to derive schematic answers to a given query, based on all available abstractions of 
indexed SW documents. Each schematic answer to the query depends on some abstraction clauses. The documents 
associated with these clauses are potentially relevant to our query, so we download them, and only them, into our 
local RDB for further processing. 
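The bookkeeping side of this scheme can be sketched as follows. The class AbstractionIndex, its methods and the document URLs are hypothetical names introduced only for illustration; the sketch assumes that each abstraction clause is identified by its predicate name and that the reasoner reports which abstraction predicates a schematic answer depends on.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class AbstractionIndex:
    """Maps predicates of abstraction clauses to the SW documents they abstract."""

    def __init__(self) -> None:
        self._docs: Dict[str, Set[str]] = defaultdict(set)

    def add_document(self, url: str, predicates: Iterable[str]) -> None:
        """Register a document under every predicate it asserts facts for."""
        for predicate in predicates:
            self._docs[predicate].add(url)

    def abstraction_predicates(self) -> Set[str]:
        """Predicates for which abstraction clauses can be given to the reasoner."""
        return set(self._docs)

    def relevant_documents(self, used_predicates: Iterable[str]) -> Set[str]:
        """Documents behind the abstraction clauses a schematic answer depends on."""
        documents: Set[str] = set()
        for predicate in used_predicates:
            documents |= self._docs.get(predicate, set())
        return documents
```

A schematic answer that depends only on the abstraction clauses for zoo:elephant and colors:pink would then trigger the download of just those documents registered under these two predicates.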

Of course, the indexing scheme presented here is just a conceptual one. The developers have the flexibility 
to choose a concrete representation - for example, they may just index by the URIs of concepts 
and relations, and only create the corresponding abstraction clauses when the reasoner is ready to inject 
them in the search space. There is also a possibility of adjusting the degree of generality of abstraction 
clauses by adding some ad hoc constraints. For example, the first of the abstraction clauses from the example 
above can be replaced with the more specific 
zoo:elephant(X) | zoo:elephant(X), pref(X,"http://www.myelephants.com/"). 
The ad hoc constraint pref(X,"http://www.myelephants.com/") requires the prefix of the URI X to be 
"http://www.myelephants.com/". The constraint is incompatible with, e. g., 
pref(X,"http://www.myrhinos.com/"), so if our reasoner derives a clause with these two constraints, it can 
safely discard it, thus improving the precision of indexing. 
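The incompatibility test for such prefix constraints is straightforward: two constraints on the same variable can be satisfied together only if one prefix extends the other. The sketch below assumes that pref constraints are given as (variable, prefix) pairs; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def prefixes_compatible(p: str, q: str) -> bool:
    """Some URI starts with both p and q iff one of them is a prefix of the other."""
    return p.startswith(q) or q.startswith(p)


def pref_constraints_satisfiable(constraints: Iterable[Tuple[str, str]]) -> bool:
    """Check a set of pref(variable, prefix) constraints for joint satisfiability."""
    by_var: Dict[str, List[str]] = {}
    for var, prefix in constraints:
        by_var.setdefault(var, []).append(prefix)
    # Pairwise compatible prefixes form a chain, so the longest one satisfies all.
    return all(prefixes_compatible(p, q)
               for prefixes in by_var.values()
               for p, q in combinations(prefixes, 2))
```

A clause carrying both pref(X,"http://www.myelephants.com/") and pref(X,"http://www.myrhinos.com/") fails this test and can be discarded.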

7 Related work. 

We are not aware of any work that uses resolution-based reasoning in a way similar to the one proposed in this 
research note, i. e., for incremental query rewriting based on the use of complete query answering over database 
abstractions, implemented with constraints over the concrete data. 

In general, semantic access to relational databases is not a new concept. Some of the work on this topic is limited to 
semantic access to, or semantic interpretation of, relational data in terms of Description Logic-based ontologies or 
RDF (see, e. g., [10, 6, 4]), or non-logical semantic schemas (see [27]). There is also a large number of projects and 
publications on the use of RDB for storing and querying large RDF and OWL datasets: see, e. g., [24, 17, 11, 12, 13], 
to mention just a few. The format of the research note does not allow us to give a comprehensive overview of such 
work, so we will concentrate on research that tries to go beyond the expressivity of DL and, at the same time, is 
applicable to legacy relational databases. 

The work presented here was originally inspired by the XSTONE project [30]. In XSTONE, a resolution-based 
theorem prover (a reimplementation of Gandalf, which is, in particular, optimised for taxonomic reasoning) is 
integrated with an RDBMS by loading rows from a database as ground facts into the reasoner and using them to 
answer queries with resolution. The system is highly scalable in terms of expressiveness: it accepts full FOL with 
some useful extensions, and also has parsers for RDF, RDFS and OWL. We believe that our approach has better 
data scalability and can cope with very large databases which are beyond the reach of XSTONE, mostly because 
our approach obtains answers in bulk, and also due to the way we use highly-optimised RDBMS. 

Papers [23] and [22] describe, albeit rather superficially, a set of tools for mapping relational databases into OWL 
and semantic querying of the RDB. Importantly, the queries are formulated as SWRL [3] rule bases. Although 
SWRL only allows Horn rules built with OWL concepts, properties and equality, its expressivity is already sufficient 
for many applications. Given a semantic query in the form of a SWRL rule base, the software generates SQL queries 
in order to extract some relevant data in the form of OWL assertions and runs a rule engine on this data to generate 
final answers. So the reasoning is, at least partially, done on a per-answer basis, which gives us hope that our 
approach can scale up better. 

Another project, OntoGrate [14], uses an approach to deductive query answering which is based on the same ideas 
as ours: their FOL reasoner, OntoEngine [15], can be used to rewrite original queries formulated in terms of some 
ontology, into a finite set of conjunctive queries in terms of the DB schema, which is then converted to SQL. For 
this task, the reasoner uses backward chaining with Generalised Modus Ponens [28], which corresponds to negative 
hyperresolution on Horn clauses in the more common terminology. A somewhat ad hoc form of term rewriting [21] 
is used to deal with equality. Termination is implemented by setting some limits on chaining, which allows them 
to avoid incremental processing. We hope to go much further, mainly, but not only, by putting our work on a solid 
theoretical foundation. In particular, we are paying attention to completeness. Since our approach is based on 
well-studied calculi, we hope to exploit the large amount of previous research on completeness and termination, which 
seems very difficult to do with the approach taken by OntoEngine. Although we are very likely to make various 
concessions to pragmatics, we would like to do this in a controllable and reproducible manner. 

On the more theoretical side, it is necessary to mention two other connections. The idea of using constraints to 
represent schematic answers is borrowed from Constraint Logic Programming [18] and Constrained Resolution [8]. 
Also, the general idea of using reasoning for preprocessing expressive queries into a database-related formalism 
was borrowed from [20], where a resolution- and paramodulation-based calculus is used to translate expressive DL 
ontologies into Disjunctive Datalog. This work also shares a starting point with ours - the observation that reasoning 
methods that treat individuals/data values separately cannot scale up sufficiently. 

8 Future work. 

Our future work will be mostly concentrated in the following directions: 

Equality treatment. If equality is present in our knowledge bases (e. g., in the form of OWL number restrictions), 
we can extend the standard superposition calculus to clauses with recording literals as we did with resolution. 
However, the completeness proof does not easily transfer to such use of superposition. Therefore, one of our main 
priorities now is to look for adjustments of the superposition calculus that would be provably complete w. r. t. 
schematic answers, without being too inefficient. An obvious obstacle to generalisation-tolerance is the absence of 
paramodulations into variables in the standard paramodulation-based calculi, so, for a start, we will try to use the 
specificity of reasoning over DB abstractions to eliminate the need for such inferences in generalisation-tolerant 
variants of superposition. 

Completeness with redundancy deletion. Static completeness, proven in Section 3, is enough to guarantee that 
we will find all necessary answers only if our search procedure generates absolutely all possible derivations in the 
given calculus. In practice, such an approach is almost always inefficient. Typically, some criteria are applied to 
detect redundant clauses and remove them from the current clause set (see, e. g., [5, 21]). 

It seems relatively easy to prove completeness of the schematic answer derivation process in the presence of the most 
important redundancy deletion technique: roughly, a clause subsumed by another clause can be deleted from the 
current clause set. The main idea for such a proof is that if subsumption removes an answer derivation from the 
search space, the search space will still contain a structurally simpler derivation of the same answer or a more 
general answer. Note that this is a property of generalisation-tolerant calculi. However, if we want to deal with 
equality efficiently, we have to demonstrate compatibility of our approach with the standard redundancy criterion 
(see, e. g., [5, 21]). 

Termination. Very often it is desirable that a query answering implementation terminates on a given query having 
exhausted all solutions, e. g., for counting and aggregation of other kinds. We are interested in identifying combi- 
nations of practically relevant fragments of FOL with reasoning methods and strategies, that guarantee termination. 

Implementation and experiments. A proof-of-concept implementation has already been created, based on a version 
of the Vampire prover [26], and two experiments were done - one on a large instance of the LUBM benchmark 
[16] and another one on the BioCyc [19] dataset (in OWL). This preliminary work will be used to guide a more 
substantial implementation effort including an implementation of a front-end for all monotonic sublanguages of 
Derivation RuleML [2], an implementation of a client-server Java API and tuning the reasoner for the task of 
schematic answer derivation over RDB abstractions. 